In general, XML is a data storage format that can be
used to define and store data. An XML document is a data storage medium
that lays out the data into elements and attributes in much the same
way that a database has rows and columns. An XML schema defines the
complete data structure and business rules similarly to how a table
definition defines the columns, their types, and the constraints to
which data must conform.
XML is much more portable than a database and is quickly becoming the
standard for data exchange among websites, applications, and other
implementations that require exchange of data. XML is derived from
Standard Generalized Markup Language (SGML) and was designed to provide
an application-independent, character-set–independent method of
transferring data, especially transaction-oriented data, across systems.
Note
XML
can easily be viewed in Internet Explorer. You simply save well-formed
XML into a text file with an .xml extension, and you can then view the
XML within Internet Explorer.
One
of the primary uses for XML is the transfer of data between disparate
systems. Through XML’s definition techniques, when the data is sent,
the complete definition accompanies the data. This means that the data
is always within context and can more easily be mapped to other
definitions of the same data.
Note
SQL Server had support for XML already, but prior to SQL Server 2005, it
was somewhat difficult to work with. Many systems avoided it because it
was too cumbersome and complex to deal with. One of the biggest issues
was the minimal support for storing XML documents, which are at the
heart of what XML is all about.
Newly Supported XML Features
In SQL Server 2005, XML features have grown by leaps and bounds. The new functionality includes an XML
data type, standards-based XQuery query capability, XQuery extensions
that allow modifications, an enhanced FOR XML clause, indexing capabilities over XML data, and SOAP endpoints for better XML transfers over the Internet.
Using the xml Data Type
The new xml
data type provides the previously lacking support for the storage of
XML documents and fragments without dealing with text conversion
issues. You can now store an XML document in its entirety in one field.
You can store the accompanying schema as well, to provide for data
definition. XML instances stored using the xml data type can be associated with an XML schema definition (XSD) to provide the definition and validation. The xml data type can be used in columns, in variables, or as parameters for stored procedures and functions.
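As a minimal sketch of the xml data type used in a column and in a variable, consider the following. The table dbo.CustomerDocs and its column names are hypothetical, chosen only for illustration, and the column is left untyped (no XSD is attached):

```sql
-- Hypothetical table for illustration; not part of any sample database.
CREATE TABLE dbo.CustomerDocs
(
    DocID int IDENTITY(1,1) PRIMARY KEY,
    Doc   xml NOT NULL              -- untyped xml column
);

-- The string literal is parsed and must be well-formed XML to be accepted.
INSERT INTO dbo.CustomerDocs (Doc)
VALUES ('<Customer ID="9" First="Danny" Last="Thomas"><Sales Qty="4"/><Sales Qty="3"/></Customer>');

-- The same data type can be used for a variable.
DECLARE @doc xml;
SELECT @doc = Doc FROM dbo.CustomerDocs WHERE DocID = 1;
SELECT @doc AS StoredDocument;
```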
Using XQuery with XML
XQuery is a language for querying XML data. When data is stored using the xml
data type, you can use XQuery to dissect the data and pull the required
information out for use in procedures. The SQL Server 2005
implementation of XQuery is based on the World Wide Web Consortium
(W3C) working drafts of the XQuery specification.
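The following sketch shows how XQuery expressions might be passed to the methods of the xml data type (query, value, exist, and nodes). The variable name and the sample document are assumptions for illustration:

```sql
DECLARE @doc xml;
SET @doc = '<Customer ID="9" First="Danny" Last="Thomas"><Sales Qty="4"/><Sales Qty="3"/></Customer>';

-- value() extracts a single scalar; the [1] guarantees a singleton.
-- exist() returns 1 when the path matches at least one node.
SELECT @doc.value('(/Customer/@First)[1]', 'varchar(20)') AS FirstName,
       @doc.value('count(/Customer/Sales)', 'int')        AS SaleCount,
       @doc.exist('/Customer/Sales[@Qty > 3]')            AS HasLargeSale;

-- query() returns an XML fragment rather than a scalar.
SELECT @doc.query('/Customer/Sales') AS SalesFragment;

-- nodes() shreds the document into one row per matching node.
SELECT S.n.value('@Qty', 'int') AS Qty
FROM @doc.nodes('/Customer/Sales') AS S(n);
```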
Using Larger XML with varchar(max)
Another new feature is related to storing XML documents. The new varchar(max) and varbinary(max)
declarations of database fields and procedure variables allow for
increased XML storage. Previously, developers were limited to 8,000
bytes in a varchar column, and because many XML documents are larger
than that, text fields were used. Now, with varchar(max),
developers can store XML of up to 2GB in a field. This approach is most
useful when there is no need to run XQuery against the data, because the
XQuery methods are available only on the xml data type.
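A short sketch of the trade-off, using hypothetical variable names: XML held in varchar(max) is just text, so it has to be converted to the xml data type before any XQuery method can be applied to it.

```sql
DECLARE @raw varchar(max);
DECLARE @asXml xml;

-- Stored as plain text; no well-formedness check happens here.
SET @raw = '<Customer ID="9" First="Danny" Last="Thomas"/>';

-- Conversion parses the text; it fails if the text is not well-formed XML.
SET @asXml = CONVERT(xml, @raw);

SELECT DATALENGTH(@raw) AS BytesStored,
       @asXml.value('(/Customer/@Last)[1]', 'varchar(20)') AS LastName;
```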
Using XML DML
The
new XML Data Manipulation Language (XML DML) extends the current
definition supplied by the W3C. The current working draft of XQuery
does not include the ability to modify XML documents. In SQL Server
2005, XQuery is extended to include the capability of inserting, updating, and deleting directly in XML documents or document fragments.
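As an illustrative sketch, the three XML DML operations are issued through the modify() method of the xml data type. The variable and sample document below are assumptions:

```sql
DECLARE @doc xml;
SET @doc = '<Customer ID="9" First="Danny" Last="Thomas"><Sales Qty="4"/></Customer>';

-- insert: add a second Sales element as the last child of Customer.
SET @doc.modify('insert <Sales Qty="3"/> as last into (/Customer)[1]');

-- replace value of: change the Qty attribute of the first Sales element.
SET @doc.modify('replace value of (/Customer/Sales/@Qty)[1] with "5"');

-- delete: remove every Sales element whose Qty is 3.
SET @doc.modify('delete /Customer/Sales[@Qty = "3"]');

-- One Sales element remains, with Qty="5".
SELECT @doc AS ModifiedDocument;
```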
Using FOR XML with PATH
In SQL Server 2005, you can nest FOR XML statements to create a hierarchy of documents. The addition of the PATH parameter provides an alternative to the cumbersome EXPLICIT clause. The results of a FOR XML statement can be stored directly in the xml data type, which leads to easier transfers of data.
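The sketch below shows one way nesting and PATH might be combined, assuming the Orders and [Order Details] tables used in this chapter's other queries. The element names Order, Customer, Items, Item, and Orders are arbitrary choices for illustration:

```sql
-- Column aliases that start with @ become attributes; other aliases become
-- child elements. Attribute columns must be listed before element columns.
SELECT O.OrderID    AS [@OrderID],
       O.CustomerID AS [Customer],
       (SELECT OD.ProductID AS [@ProductID],
               OD.Quantity  AS [@Quantity]
        FROM [Order Details] AS OD
        WHERE OD.OrderID = O.OrderID
        FOR XML PATH('Item'), TYPE) AS [Items]   -- TYPE returns the xml data type
FROM Orders AS O
WHERE O.OrderID < 10251
FOR XML PATH('Order'), ROOT('Orders');
```

Each order should come back as an Order element with an OrderID attribute, a Customer child element, and an Items element that wraps one Item element per order detail row.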
Using XML Indexes
The XML data stored in a system is much more efficiently queried if it has
indexing capabilities. Documents and fragments stored in the xml data type can have indexes defined for more effective processing and quicker response times.
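A minimal sketch of indexing an xml column follows. The table dbo.OrderDocs and the index names are hypothetical. A primary XML index requires the table to have a clustered primary key, and secondary XML indexes (PATH, VALUE, or PROPERTY) are built on top of the primary XML index:

```sql
-- Hypothetical table; the PRIMARY KEY is clustered by default.
CREATE TABLE dbo.OrderDocs
(
    DocID int IDENTITY(1,1) PRIMARY KEY,
    Doc   xml NOT NULL
);

-- The primary XML index shreds and persists the nodes of every document.
CREATE PRIMARY XML INDEX PXML_OrderDocs_Doc
    ON dbo.OrderDocs (Doc);

-- A secondary PATH index can help queries that filter on paths.
CREATE XML INDEX IXML_OrderDocs_Doc_Path
    ON dbo.OrderDocs (Doc)
    USING XML INDEX PXML_OrderDocs_Doc FOR PATH;
```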
Using XML Document Returns
You
can configure HTTP endpoints or addresses to which requests based on
the SOAP standard can be sent. SQL Server can now receive the packets
directly, with no need for middle-tier processing and redirection.
There
is a considerable amount of new XML functionality in SQL Server. To
understand its use, you also have to understand the layout and
architecture behind XML documents.
XML: The Basics
An XML document consists of one or more elements. Each element begins with a tag enclosed in angle brackets (< >).
The first word that appears inside the angle brackets is the name of
the element. The rest of the tag consists of the element's attributes.
For example, here’s an element:
<Customer ID="9" First="Danny" Last="Thomas"/>
The name of this element, or the element type, is Customer. The element has the attributes ID, First, and Last, each of which has a value. Because the element has no content, it ends with a forward slash and a closing angle bracket (/>), which marks the end of the element.
An element can also contain other elements, as shown here:
<Customer ID="9" First="Danny" Last="Thomas">
<Sales Qty="4"/>
<Sales Qty="3"/>
</Customer>
In this case, the Customer element contains two Sales
elements. Notice that on the first line, there isn’t a slash before the
ending bracket; the matching slash for this element is on the last
line. This is how objects are nested in XML.
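To see why the matching closing tag matters, the following sketch converts a string with a missing </Customer> tag to the xml data type. The variable names are assumptions, and the exact error text SQL Server returns is not reproduced here:

```sql
DECLARE @text varchar(200);
DECLARE @doc  xml;

-- The Sales element is self-closed, but </Customer> is missing.
SET @text = '<Customer ID="9"><Sales Qty="4"/>';

BEGIN TRY
    SET @doc = CONVERT(xml, @text);     -- parsing happens here
    SELECT 'well formed' AS Result;
END TRY
BEGIN CATCH
    -- The parser rejects text whose tags are not properly nested and closed.
    SELECT 'not well formed' AS Result, ERROR_MESSAGE() AS Detail;
END CATCH;
```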
Outputting Data in XML Format
To output data in XML format, you use the SELECT statement with the FOR XML
operator. This tells SQL Server that instead of returning a rowset, it
should return an XML document. There are four different options for
generating the XML: RAW, AUTO, EXPLICIT, and PATH.
Exam Alert
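Where is the XML schema? To produce XML output that also contains the schema information for the XML, you must tack a schema qualifier onto the end of the FOR XML clause. The XMLDATA qualifier returns an inline XML-Data Reduced (XDR) schema; the XMLSCHEMA qualifier, new in SQL Server 2005, returns an inline XSD schema.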
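As a brief sketch, a schema qualifier is appended after the mode, separated by a comma. The queries assume the Orders table used elsewhere in this chapter. XMLDATA requests an inline XDR schema, and XMLSCHEMA, available from SQL Server 2005, requests an inline XSD schema:

```sql
-- Inline XDR schema followed by the data.
SELECT O.OrderID, O.CustomerID
FROM Orders AS O
WHERE O.OrderID < 10251
FOR XML AUTO, XMLDATA;

-- Inline XSD schema followed by the data.
SELECT O.OrderID, O.CustomerID
FROM Orders AS O
WHERE O.OrderID < 10251
FOR XML AUTO, XMLSCHEMA;
```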
In AUTO
mode, SQL Server returns the rowset in an automatically generated,
nested XML format. If the query has no joins, it doesn’t have a nesting
at all. If the query has joins, it returns the first row from the first
table and then all the correlated rows from each joined table as a
nested level. For example, the following query shows order details
nested inside orders:
SELECT O.OrderID, O.CustomerID, OD.ProductID, OD.UnitPrice, OD.Quantity
FROM Orders AS O
JOIN [Order Details] AS OD ON O.OrderID = OD.OrderID
WHERE O.OrderID < 10251
FOR XML AUTO
<O OrderID="10248" CustomerID="VINET">
<OD ProductID="11" UnitPrice="14.0000" Quantity="12"/>
<OD ProductID="42" UnitPrice="9.8000" Quantity="10"/>
<OD ProductID="72" UnitPrice="34.8000" Quantity="5"/></O>
<O OrderID="10249" CustomerID="TOMSP">
<OD ProductID="14" UnitPrice="18.6000" Quantity="9"/>
<OD ProductID="51" UnitPrice="42.4000" Quantity="40"/></O>
<O OrderID="10250" CustomerID="HANAR">
<OD ProductID="41" UnitPrice="7.7000" Quantity="10"/>
<OD ProductID="51" UnitPrice="42.4000" Quantity="35"/>
<OD ProductID="65" UnitPrice="15.8000" Quantity="15"/></O>
Note that the alias for each table becomes the element name for that table's rows within the XML output.
Note
XML results as shown in the previous query must be wrapped in a single
root element (for example, <root> and </root>) to be well formed.
Well-formed XML can be displayed within Internet Explorer for ease of viewing.
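One way to get the root element without editing the output by hand is the ROOT directive, which is available in SQL Server 2005. This sketch reuses the earlier query; the root name Orders is an arbitrary choice:

```sql
-- ROOT wraps the whole result in a single <Orders>...</Orders> element.
SELECT O.OrderID, O.CustomerID, OD.ProductID, OD.UnitPrice, OD.Quantity
FROM Orders AS O
JOIN [Order Details] AS OD ON O.OrderID = OD.OrderID
WHERE O.OrderID < 10251
FOR XML AUTO, ROOT('Orders');
```

If ROOT is specified without a name, the root element is named root.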
When
you run the query, the actual XML comes out all on one line, as a
stream of data. XML output does not use linefeeds or make things
readable in any fashion. The easiest way to write queries for XML is to
write them with the FOR XML clause left out, make sure that they are returning the data you want, and then add the FOR XML back onto the end of the query. This eliminates the need for a lot of extra formatting.
The use of the RAW mode of XML output is best suited for situations in which minimal formatting is desired. In RAW
mode, each row is returned as an element with the identifier row.
Here’s an example of the same query you just saw, this time returned in
RAW mode:
SELECT O.OrderID, O.CustomerID, OD.ProductID, OD.UnitPrice, OD.Quantity
FROM Orders AS O
JOIN [Order Details] AS OD ON O.OrderID = OD.OrderID
WHERE O.OrderID < 10251
FOR XML RAW
<row OrderID="10248" CustomerID="VINET"
ProductID="11" UnitPrice="14.0000" Quantity="12"/>
<row OrderID="10248" CustomerID="VINET"
ProductID="42" UnitPrice="9.8000" Quantity="10"/>
<row OrderID="10248" CustomerID="VINET"
ProductID="72" UnitPrice="34.8000" Quantity="5"/>
<row OrderID="10249" CustomerID="TOMSP"
ProductID="14" UnitPrice="18.6000" Quantity="9"/>
<row OrderID="10249" CustomerID="TOMSP"
ProductID="51" UnitPrice="42.4000" Quantity="40"/>
<row OrderID="10250" CustomerID="HANAR"
ProductID="41" UnitPrice="7.7000" Quantity="10"/>
<row OrderID="10250" CustomerID="HANAR"
ProductID="51" UnitPrice="42.4000" Quantity="35"/>
<row OrderID="10250" CustomerID="HANAR"
ProductID="65" UnitPrice="15.8000" Quantity="15"/>
Notice
that the XML output is in an element/attribute association in that each
row of the table is returned as an element, with each column being an
attribute. If you prefer, you can return everything as elements, with
no attributes, using FOR XML RAW, ELEMENTS. New in SQL Server 2005, this provides the following results:
<row><OrderID>10248</OrderID><CustomerID>VINET</CustomerID>
<ProductID>11</ProductID><UnitPrice>14.0000</UnitPrice>
<Quantity>12</Quantity></row>
<row><OrderID>10248</OrderID><CustomerID>VINET</CustomerID>
<ProductID>42</ProductID><UnitPrice>9.8000</UnitPrice>
<Quantity>10</Quantity></row>
<row><OrderID>10248</OrderID><CustomerID>VINET</CustomerID>
<ProductID>72</ProductID><UnitPrice>34.8000</UnitPrice>
<Quantity>5</Quantity></row>
<row><OrderID>10249</OrderID><CustomerID>TOMSP</CustomerID>
<ProductID>14</ProductID><UnitPrice>18.6000</UnitPrice>
<Quantity>9</Quantity></row>
<row><OrderID>10249</OrderID><CustomerID>TOMSP</CustomerID>
<ProductID>51</ProductID><UnitPrice>42.4000</UnitPrice>
<Quantity>40</Quantity></row>
<row><OrderID>10250</OrderID><CustomerID>HANAR</CustomerID>
<ProductID>41</ProductID><UnitPrice>7.7000</UnitPrice>
<Quantity>10</Quantity></row>
<row><OrderID>10250</OrderID><CustomerID>HANAR</CustomerID>
<ProductID>51</ProductID><UnitPrice>42.4000</UnitPrice>
<Quantity>35</Quantity></row>
<row><OrderID>10250</OrderID><CustomerID>HANAR</CustomerID>
<ProductID>65</ProductID><UnitPrice>16.8000</UnitPrice>
<Quantity>15</Quantity></row>
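Adding XSINIL after ELEMENTS (FOR XML RAW, ELEMENTS XSINIL) returns an element with the attribute xsi:nil="true" for each column that contains NULL, instead of omitting the element, and adds the xsi namespace declaration to the row elements.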
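The element-centric options can be sketched as follows. The first query assumes the Orders and [Order Details] tables from the earlier examples. The second is a self-contained illustration with made-up column names (ID and Region) to show how a NULL is handled:

```sql
-- Every column becomes a child element of <row> instead of an attribute.
SELECT O.OrderID, O.CustomerID, OD.ProductID, OD.UnitPrice, OD.Quantity
FROM Orders AS O
JOIN [Order Details] AS OD ON O.OrderID = OD.OrderID
WHERE O.OrderID < 10251
FOR XML RAW, ELEMENTS;

-- Without XSINIL the NULL column is simply left out of the row.
SELECT 1 AS ID, CAST(NULL AS varchar(10)) AS Region
FOR XML RAW, ELEMENTS;

-- With XSINIL the NULL column appears as <Region xsi:nil="true"/>.
SELECT 1 AS ID, CAST(NULL AS varchar(10)) AS Region
FOR XML RAW, ELEMENTS XSINIL;
```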
The EXPLICIT and PATH
options enable you to specify the format of the XML that will be
created. Using these options makes the query more complicated to
formulate, but it gives you a little more control over the output.
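To give a feel for the difference, here is a single-level sketch that should produce the same shape of output both ways, one Order element per row with OrderID and CustomerID attributes. It assumes the Orders table from the earlier queries; the element name Order is an arbitrary choice:

```sql
-- EXPLICIT: the query must return Tag and Parent columns, and each column
-- alias is written as ElementName!TagNumber!AttributeName.
SELECT 1            AS Tag,
       NULL         AS Parent,
       O.OrderID    AS [Order!1!OrderID],
       O.CustomerID AS [Order!1!CustomerID]
FROM Orders AS O
WHERE O.OrderID < 10251
FOR XML EXPLICIT;

-- PATH: the row element is named in the clause, and @ marks attributes.
SELECT O.OrderID    AS [@OrderID],
       O.CustomerID AS [@CustomerID]
FROM Orders AS O
WHERE O.OrderID < 10251
FOR XML PATH('Order');
```

With multiple nesting levels, EXPLICIT requires a UNION ALL of one SELECT per level plus careful ordering, which is where PATH tends to be much shorter.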
To
answer the questions on the 70-431 exam, you really only need to know
the definitions. You will want to dig in much deeper if you continue
with the remaining SQL Server certification exams.
Note
You
are not going to become an XML expert overnight, and the material
presented in this book is not intended to instruct you on XML.
Essentially, all this chapter has done thus far is show how to draw
data out of SQL Server in XML format. It has not tried to explain XML
or produce an XML reference text, but knowing the material presented in
this section and the two that follow should get you through the XML
portion of the 70-431 exam.
Using the FOR XML
clause to view data in XML format is really easy when you get the hang
of it. Getting XML data into a database is a little trickier and not
as user friendly.